impl(o11y): introduce error attributes#12189
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refines the tracing telemetry by introducing dedicated attributes for error types, exception types, and status messages. These additions provide a more detailed and standardized way to categorize and understand failures occurring within client-side operations, improving observability and debugging capabilities. The changes ensure that critical error information is consistently captured in OpenTelemetry spans, offering clearer insights into the root causes of issues. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new ErrorTypeUtil class to provide a standardized way of classifying exceptions into specific error types (e.g., client timeout, connection error, authentication error) for OpenTelemetry tracing. It adds new observability attributes (error.type, exception.type, status.message) and integrates this error classification into the SpanTracer to enrich span data upon failed attempts. The changes also include comprehensive unit and integration tests to validate the new error type extraction and tracing functionality. Feedback from the review includes correcting incorrect copyright years, removing redundant semicolons and toString() overrides, and updating Javadoc for accuracy.
sdk-platform-java/gax-java/gax/src/main/java/com/google/api/gax/tracing/ErrorTypeUtil.java
Show resolved
Hide resolved
sdk-platform-java/gax-java/gax/src/test/java/com/google/api/gax/tracing/ErrorTypeUtilTest.java
Show resolved
Hide resolved
...va-showcase/gapic-showcase/src/test/java/com/google/showcase/v1beta1/it/ITOtelErrorType.java
Outdated
Show resolved
Hide resolved
sdk-platform-java/gax-java/gax/src/main/java/com/google/api/gax/tracing/ApiTracer.java
Show resolved
Hide resolved
sdk-platform-java/gax-java/gax/src/main/java/com/google/api/gax/tracing/ErrorTypeUtil.java
Outdated
Show resolved
Hide resolved
sdk-platform-java/gax-java/gax/src/main/java/com/google/api/gax/tracing/ErrorTypeUtil.java
Outdated
Show resolved
Hide resolved
This reverts commit 0ce71dd.
…-attr-error-type-transfer
sdk-platform-java/gax-java/gax/src/main/java/com/google/api/gax/tracing/ErrorTypeUtil.java
Show resolved
Hide resolved
GraalVM failures seem unrelated |
diegomarquezp
left a comment
There was a problem hiding this comment.
Comments addressed
sdk-platform-java/gax-java/gax/src/main/java/com/google/api/gax/tracing/ErrorTypeUtil.java
Show resolved
Hide resolved
|
|
||
| public class ErrorTypeUtil { | ||
|
|
||
| public enum ErrorType { |
|
|
||
| private String extractErrorMessage(Throwable error) { | ||
| Throwable cause = error; | ||
| while (cause != null) { |
There was a problem hiding this comment.
Thanks for the catch. Done.
…-attr-error-type-transfer
…s://github.com/googleapis/google-cloud-java into observability/tracing-attr-error-type-transfer
This reverts commit e665b6e.
| endAttempt(); | ||
| } | ||
|
|
||
| private String extractErrorMessage(Throwable error) { |
There was a problem hiding this comment.
Yes, it was there before a refactor. Thanks for the catch. Removing.
| Status.newBuilder().setCode(com.google.rpc.Code.UNAVAILABLE.ordinal()).build()) | ||
| .build(); | ||
|
|
||
| assertThrows(UnavailableException.class, () -> client.echo(echoRequest)); |
There was a problem hiding this comment.
Is this from the interceptor or the error in EchoRequest? I see that UNAVAILABLE is set on both places.
There was a problem hiding this comment.
setCode(com.google.rpc.Code.UNAVAILABLE.ordinal()) should not be here. We should only rely on the interceptor.
| if (t.getCause() == null) { | ||
| return t; | ||
| } | ||
| if (t instanceof ExecutionException || t instanceof UncheckedExecutionException) { |
There was a problem hiding this comment.
I guess there is a concrete use case for this scenario?
There was a problem hiding this comment.
Confirmed only one exception should be unwrapped and added documentation both in javadoc and this PR's description.
…-attr-error-type-transfer
This PR implements error type recording on attempt spans for improved observability in `gax-java`, addressing requirements for better failure analysis. #### Key Changes * **`ErrorTypeUtil` Class**: This new utility class classifies errors based on a defined priority to populate observability attributes. ### Error Classification Priority The extraction logic determines the error type based on the following priority: 1. **`google.rpc.ErrorInfo.reason`**: If the error response from the service includes `ErrorInfo` details, the reason field (e.g., `RATE_LIMIT_EXCEEDED`) is used. 2. **Server Status Code**: If no reason is available, it checks for a server status code. For HTTP, this is the numeric status code (e.g., `403`, `503`). For gRPC, this is the status code name (e.g., `PERMISSION_DENIED`, `UNAVAILABLE`). 3. **Client-Side Network/Operational Errors**: If it's a client-side failure, it maps common exceptions to specific enum representations (e.g., `CLIENT_TIMEOUT`, `CLIENT_CONNECTION_ERROR`). 4. **Language-specific error type**: Falls back to the class simple name of the exception (e.g., `NullPointerException`). 5. **Internal Fallback**: Defaults to `INTERNAL` if no other classification applies. ### Exceptions to be Unwrapped We investigated standard execution wrappers to ensure accurate error classification in `ErrorTypeUtil`. We only found one exception so far that needs unwrapping in this context. #### `UncheckedExecutionException` Occurs in [ServerStreamIterator.java](https://github.com/googleapis/sdk-platform-java/blob/main/gax-java/gax/src/main/java/com/google/api/gax/rpc/ServerStreamIterator.java) when wrapping checked exceptions observed during stream iteration. ```java if (last instanceof Throwable) { Throwable throwable = (Throwable) last; throw new UncheckedExecutionException(throwable); } ``` It is also thrown in [ApiExceptions.java](https://github.com/googleapis/sdk-platform-java/blob/main/gax-java/gax/src/main/java/com/google/api/gax/rpc/ApiExceptions.java) during synchronous call translation: ```java public static <ResponseT> ResponseT callAndTranslateApiException(ApiFuture<ResponseT> future) { try { return Futures.getUnchecked(future); } catch (UncheckedExecutionException exception) { if (exception.getCause() instanceof RuntimeException) { // ... } throw exception; } } ```
This PR implements error type recording on attempt spans for improved observability in
gax-java, addressing requirements for better failure analysis.Key Changes
ErrorTypeUtilClass: This new utility class classifies errors based on a defined priority to populate observability attributes.Error Classification Priority
The extraction logic determines the error type based on the following priority:
google.rpc.ErrorInfo.reason: If the error response from the service includesErrorInfodetails, the reason field (e.g.,RATE_LIMIT_EXCEEDED) is used.403,503). For gRPC, this is the status code name (e.g.,PERMISSION_DENIED,UNAVAILABLE).CLIENT_TIMEOUT,CLIENT_CONNECTION_ERROR).NullPointerException).INTERNALif no other classification applies.Exceptions to be Unwrapped
We investigated standard execution wrappers to ensure accurate error classification in
ErrorTypeUtil. We only found one exception so far that needs unwrapping in this context.UncheckedExecutionExceptionOccurs in ServerStreamIterator.java when wrapping checked exceptions observed during stream iteration.
It is also thrown in ApiExceptions.java during synchronous call translation: